Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings

نویسندگان

Da-Rong Liu

Kuan-Yu Chen

Hung-Yi Lee

Lin-shan Lee

چکیده

Unsupervised discovery of acoustic tokens from audio corpora without annotation and learning vector representations for these tokens have been widely studied. Although these techniques have been shown successful in some applications such as query-byexample Spoken Term Detection (STD), the lack of mapping relationships between these discovered tokens and real phonemes have limited the down-stream applications. This paper represents probably the first attempt towards the goal of completely unsupervised phoneme recognition, or mapping audio signals to phoneme sequences without phoneme-labeled audio data. The basic idea is to cluster the embedded acoustic tokens and learn the mapping between the cluster sequences and the unknown phoneme sequences with a Generative Adversarial Network (GAN). An unsupervised phoneme recognition accuracy of 36% was achieved in the preliminary experiments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many lowresourced languages lack audio-text aligned data, and supervised methods cannot be applied on them. In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned. In the first stage, each word-level audio segment in the u...

متن کامل

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain multispeaker audio streams. The NPC framework expl...

متن کامل

Learning Weakly Supervised Multimodal Phoneme Embeddings

Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing the lips? movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In...

متن کامل

Single-Channel Speech Separation usin Factorizati

We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which in an unsupervised manner can learn sparse representations of the data. This is applied to the learning of personalized dictionaries from a speech corpus, which in turn are used to separa...

متن کامل

Single-channel speech separation using sparse non-negative matrix factorization

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2018

Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings

نویسندگان

چکیده

منابع مشابه

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

Learning Weakly Supervised Multimodal Phoneme Embeddings

Single-Channel Speech Separation usin Factorizati

Single-channel speech separation using sparse non-negative matrix factorization

عنوان ژورنال:

اشتراک گذاری